# Building Large Language Models

Tags: AI, LLM, programming

## Examples

TI-84 GPT4All and YouTube

How to add custom GPTs to any website in minutes.


## Libraries

Run a variety of LLMs locally using Ollama (a minimal usage sketch follows the links below)

Ollama main site

Ollama models supported
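
As a minimal sketch of calling a locally running model, assuming the Ollama server is on its default port (11434) and a model has already been pulled; the model name llama3 is an assumption, substitute whatever you have:

```python
# Query a local Ollama server over its REST API (default: localhost:11434).
# Assumes `ollama pull llama3` (or another model) has been run first.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",     # assumption: substitute any locally pulled model
    "prompt": "Why is the sky blue?",
    "stream": False,       # one JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```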

## Collecting Data

Scripts to convert Libgen to txt (see also Explaining LLMs)

## Technical Details

From my question to Metaphor.systems:

https://jalammar.github.io/illustrated-transformer/

https://huggingface.co/
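
The Illustrated Transformer builds up to scaled dot-product attention; its core formula (standard notation, a fact about the Transformer rather than a quote from the post) is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension.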

Google’s free BERT model, a small model for language understanding
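
A minimal sketch of loading it, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint (PyTorch required):

```python
# Load Google's small BERT checkpoint via Hugging Face transformers
# and get contextual embeddings for one sentence.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("LLMs compress the web into weights.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, 768 hidden dims)
```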

OpenAI Cookbook, a GitHub repo of examples of using the OpenAI API.
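
In the style of the cookbook's examples, a minimal chat-completion call with the official openai Python package (v1+); the model name here is an assumption:

```python
# Minimal chat-completion sketch; expects OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": "Explain attention in one sentence."}],
)
print(response.choices[0].message.content)
```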

GPT in 60 lines of NumPy (via HN)
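
In the same spirit, here is a minimal sketch of causal scaled dot-product attention, the core of GPT, in plain NumPy. This is my own sketch, not the post's actual code:

```python
import numpy as np

def softmax(x):
    # Subtract the row max for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # scaled dot products
    mask = np.triu(np.ones_like(scores), k=1) * -1e9   # hide future tokens
    return softmax(scores + mask) @ v                  # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))                  # six tokens, 16-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # (6, 16)
```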

## How to Build

2023 summary from Simon Willison: a good list of resources for how to build your own LLM.

The Mathematics of Training LLMs, with Quentin Anthony of EleutherAI

A deep dive into the viral Transformer Math 101 article and into high-performance distributed training for Transformer-based architectures.
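
The article's headline rule of thumb is that training compute is roughly $C \approx 6ND$ FLOPs for a model with $N$ parameters trained on $D$ tokens. A back-of-the-envelope sketch (the example numbers below are illustrative, not from the article):

```python
# Rule of thumb from Transformer Math 101: training compute
# C ≈ 6 * N * D FLOPs for N parameters and D training tokens.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Illustrative example (not from the article): a 7B-parameter model
# trained on 1T tokens.
print(f"{training_flops(7e9, 1e12):.2e} FLOPs")  # ~4.20e+22
```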

An Observation on Generalization: a one-hour talk by Ilya Sutskever, OpenAI's chief scientist. He has previously argued that compression may be all you need for intelligence. In this lecture, he builds on the idea of Kolmogorov complexity and how neural networks implicitly seek simplicity in the representations they learn. He brings a clarity of thought to the generalization of these novel systems that is rarely seen in the industry.

Brendan Bycroft wrote a well-done step-by-step visualization of how an LLM works

Welcome to the walkthrough of the GPT large language model! Here we’ll explore the model nano-gpt, with a mere 85,000 parameters.

Its goal is a simple one: take a sequence of six letters, such as C B A B B C, and sort them into alphabetical order, i.e. “ABBBCC”.
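
The task itself is easy to state in code; a toy generator for such (input, target) pairs (my sketch, not Bycroft's) could look like:

```python
import random

# Toy data generator for the nano-gpt sorting task: the input is a
# random sequence of six letters from {A, B, C}; the target is the
# same letters in alphabetical order.
def make_example(rng: random.Random) -> tuple[str, str]:
    seq = "".join(rng.choice("ABC") for _ in range(6))
    return seq, "".join(sorted(seq))

rng = random.Random(0)
print(make_example(rng))  # e.g. ('CBABBC', 'ABBBCC'); output depends on the seed
```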

Bycroft’s Visual Step-by-Step Description of an LLM